NSF PAR Search | NSF Public Access Repository

Learning from Synthetic Human Group Activities

Chang, Che-Jui; Li, Danrui; Patel, Deep; Goes, Parth; Zhou, Honglu; Moon, Seonghyeon; Sohn, Samuel S; Yoon, Sejong; Pavlovic, Vladimir; Kapadia, Mubbasir (July 2024, The IEEE/CVF Conference on Computer Vision and Pattern Recognition)

The study of complex human interactions and group activities has become a focal point in human-centric computer vision. However, progress in related tasks is often hindered by the challenges of obtaining large-scale labeled datasets from real-world scenarios. To address this limitation, we introduce M3Act, a synthetic data generator for multi-view multi-group multi-person human atomic actions and group activities. Powered by Unity Engine, M3Act features multiple semantic groups, highly diverse and photorealistic images, and a comprehensive set of annotations, which facilitates the learning of human-centered tasks across single-person, multi-person, and multi-group conditions. We demonstrate the advantages of M3Act across three core experiments. The results suggest our synthetic dataset can significantly improve the performance of several downstream methods and replace real-world datasets to reduce cost. Notably, M3Act improves the state-of-the-art MOTRv2 on the DanceTrack dataset, leading to a jump on the leaderboard from 10th to 2nd place. Moreover, M3Act opens new research for controllable 3D group activity generation. We define multiple metrics and propose a competitive baseline for the novel task. Our code and data are available at our project page: http://cjerry1243.github.io/M3Act.
Full Text Available

The IVI Lab entry to the GENEA Challenge 2022 – A Tacotron2 Based Method for Co-Speech Gesture Generation With Locality-Constraint Attention Mechanism

https://doi.org/10.1145/3536221.3558060

Chang, Che-Jui; Zhang, Sen; Kapadia, Mubbasir (November 2022, GENEA Challenge 2022)

Full Text Available

The Importance of Multimodal Emotion Conditioning and Affect Consistency for Embodied Conversational Agents

https://doi.org/10.1145/3581641.3584045

Chang, Che-Jui; Sohn, Samuel S; Zhang, Sen; Jayashankar, Rajath; Usman, Muhammad; Kapadia, Mubbasir (March 2023, Intelligent User Interfaces 2023)

Full Text Available

Disentangling audio content and emotion with adaptive instance normalization for expressive facial animation synthesis

https://doi.org/10.1002/cav.2076

Chang, Che-Jui; Zhao, Long; Zhang, Sen; Kapadia, Mubbasir (June 2022, Computer Animation and Virtual Worlds)

Full Text Available
